Learning Complex Neural Network Policies with Trajectory Optimization
Authors
Abstract
Direct policy search methods offer the promise of automatically learning controllers for complex, high-dimensional tasks. However, prior applications of policy search often required specialized, low-dimensional policy classes, limiting their generality. In this work, we introduce a policy search algorithm that can directly learn high-dimensional, general-purpose policies, represented by neural networks. We formulate the policy search problem as an optimization over trajectory distributions, alternating between optimizing the policy to match the trajectories, and optimizing the trajectories to match the policy and minimize expected cost. Our method can learn policies for complex tasks such as bipedal push recovery and walking on uneven terrain, while outperforming prior methods.
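The alternation described in the abstract can be illustrated with a deliberately simplified sketch. This is not the paper's algorithm: it replaces trajectory distributions with a handful of deterministic state-action pairs, uses a one-parameter linear policy instead of a neural network, and couples the two steps with a fixed quadratic penalty. The function name `alternate`, the weight `rho`, and the quadratic cost are all assumptions made for illustration.

```python
def alternate(states, targets, rho=1.0, iterations=30):
    """Toy alternating optimization (hypothetical, not the paper's method).

    Cost of action u in state x with target t: (u - t)^2.
    Policy: u = k * x with a single scalar parameter k.
    Coupling: (rho / 2) * (u - k * x)^2 penalizes policy/trajectory mismatch.
    """
    k = 0.0
    actions = [0.0] * len(states)
    for _ in range(iterations):
        # Trajectory step: for each state, minimize
        #   (u - t)^2 + (rho / 2) * (u - k * x)^2
        # over u; setting the derivative to zero gives the closed form below.
        actions = [
            (2.0 * t + rho * k * x) / (2.0 + rho)
            for x, t in zip(states, targets)
        ]
        # Policy step: least-squares fit of k so that k * x matches the actions.
        k = sum(x * u for x, u in zip(states, actions)) / sum(x * x for x in states)
    return k, actions


# Example with targets that a linear policy can represent exactly (t = 2 * x).
gain, fitted_actions = alternate([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])
print(gain, fitted_actions)
```

In this toy setting the "trajectory" is pulled toward low cost while staying close to the current policy, and the policy is then refit to the trajectory, which mirrors the two-way matching in the abstract only at a structural level.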
Related resources
Guided Policy Search
Direct policy search can effectively scale to high-dimensional systems, but complex policies with hundreds of parameters often present a challenge for such methods, requiring numerous samples and often falling into poor local optima. We present a guided policy search algorithm that uses trajectory optimization to direct policy learning and avoid poor local optima. We show how differential dynam...
Learning Dynamic Manipulation Skills under Unknown Dynamics with Guided Policy Search
Planning and trajectory optimization can readily be used for kinematic control of robotic manipulation. However, planning dynamic motor skills requires a detailed physical simulation, and some aspects of the task, such as contacts, are very difficult to simulate with enough accuracy for dynamic manipulation. Alternatively, manipulation skills can be learned from experience, allowing them to def...
Learning a Structured Neural Network Policy for a Hopping Task
In this work, we attempt to learn a neural network policy for dynamic, underactuated locomotion tasks. Learning a policy for such a task is nontrivial due to the dynamic, fast-changing, nonlinear, and contact-rich dynamics of this task. We use existing trajectory optimization techniques to optimize a set of policies. For this, we present a method that allows learning contact-rich dynamics for ...
Deep Learning Quadcopter Control via Risk-Aware Active Learning
Modern optimization-based approaches to control increasingly allow automatic generation of complex behavior from only a model and an objective. Recent years have seen growing interest in fast solvers to also allow real-time operation on robots, but the computational cost of such trajectory optimization remains prohibitive for many applications. In this paper we examine a novel deep neural networ...
Learning Neural Network Policies with Guided Policy Search under Unknown Dynamics
We present a policy search method that uses iteratively refitted local linear models to optimize trajectory distributions for large, continuous problems. These trajectory distributions can be used within the framework of guided policy search to learn policies with an arbitrary parameterization. Our method fits time-varying linear dynamics models to speed up learning, but does not rely on learni...
Publication date: 2014